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FOREWORD 


"N\ 

"  This  report  develops  a  comprehensive  theory  of  parallel  Kalman  filtering 
based  on  a  unique  decoupling  principle  permitting  the  predictor  and  corrector 
equations  in  the  filter  to  be  computed  in  parallel.  Highly  parallel  algorithms 
and  systolic  architectures  for  efficiently  implementing  these  advanced  filter¬ 
ing  techniques  are  presented.  The  application  of  these  methods  to  Strategic 
Defense  Initiative  (SDI)  target  tracking  problems  is  also  described.  — ' 7  o  r  £  ■  J 
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SECTION  1 


INTRODUCTION 


1.1  BACKGROUND 

In  navigation  equipment,  such  as  the  GPS,  a  Kalman  filter  acts  to  smooth 
data  when  the  unit  is  unaided,  or  as  an  estimating  filter  when  inertial  data 
are  accepted.  The  need  to  integrate  multiple  sensors  results  in  a  hybrid 
system  with  extremely  large  computational  requirements  for  real-time  applica¬ 
tions.  Often  the  hybrid  system  takes  the  form  of  a  "cascaded"  filter  to  ease 
the  computational  burden.  Sensor  data  integration  is  often  difficult  due  to 
correlating  the  time  of  the  measurements  and  the  different  measurement  rates 
of  the  sensors.  In  this  project,  parallel  Kalman  filter  architectures  are 
optimized  for  hybrid  systems  consisting  of  multiple  sensors  to  achieve 
improved  performance.  The  computational  advantage  of  parallel  processing 
minimizes  measurement  time  sensitivity  and  data  transfer  over  the  bus.  Thus, 
multi-rate  filtering  is  attainable  via  a  unique  decoupling  of  the  predictor 
and  corrector  equations  in  the  filter  while  maintaining  optimal  estimation. 

The  parallel  processing  techniques  can  be  applied  to  real-time  navigation, 
target  tracking  and  scene  analysis. 

With  recent  developments  in  advanced  sensors,  a  severe  load  has  been 
placed  upon  real-time  signal  processing  systems  to  process  large  amounts  of 
data.  Although  the  technology  for  implementing  advanced  sensors  already 
exists,  the  actual  implementation  depends  strongly  on  the  ability  to  develop 
real-time  signal  processing  hardware  to  process  the  data.  Thus,  to  meet  the 
exceptionally  high  throughput  requirements  in  DoD  signal  processing  applica¬ 
tions,  considerable  attention  must  be  given  to  the  development  of  highly 
parallel  (or  systolic)  signal  processing  architectures.  Because  signal 
processing  architectures  tend  to  be  problem  dependent  to  achieve  the  necessary 
computational  requirements,  this  project  develops  systolic  signal  processor 
architectures  which  are  well  suited  for  recursive  filtering,  target  tracking, 
image  processing  and  signal  processing.  The  parallel  architectures  are  based 
on  mapping  the  widely-used  Kalman  filter  equations  onto  a  generalized  systolic 


array  for  linear  and  nonlinear  parallel  computation.  Although  this  was 
originally  developed  by  Travassos  (1982)  for  linear  estimation  on  two  (2) 
processors,  the  extension  of  the  method  to  n  processors  is  the  thrust  of  this 
project. 

To  illustrate  the  need  to  extend  the  method  for  SDI  sensor  track  proces¬ 
sing  via  a  Kalman  filter  consider  the  simplest  case  for  linear  filtering 
(nonlinear  filtering  is  even  more  computationally  demanding).  The  total 
number  of  arithmetic  operations  that  must  be  computed  in  the  Kalman  filter 
algorithm  can  be  counted  and  multiplied  by  different  multiplier  and  adder 
speeds.  For  the  Kalman  filter  algorithm  given  in  Table  1,  the  total  number 
of  multiplications,  additions  and  divisions  is  given  by: 

(n*n  +  2n-l)  additions,  (2n*n  +  4n  +  l)  multiplications,  and  1  division. 


Table  1:  Standard  Kalman  Filter  Algorithm* 


* 

Kalman 

Gain 

%  ’  V+)H-I  t'fow’S  +  iF1 

Filter 

Update 

V+>  •  V->  +  \ 

*  t1  -  yy  y->  +  yk 

Covariance 

Update 

pk(+)  -  [I  -  K.A]  Pk(-) 

Measurement 

Update 

-  Vi  \<+> 

♦Note  that  when  scalar  measurements  are  processed,  the  inverse 
operation  reduces  to  a  division  operation. 


Therefore,  the  overall  execution  time  needed  to  update  the  Kalman  filter 
algorithm  in 

t  -  (n*n  +  2n  -  1)  t  +  (2n*n  +  4n  +  1)  t  +  t. 
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t  is  the  addition  time,  tffi  is  the  multiplication  time,  and  t^  is 
the  division  time. 

For  example,  if  t  *  125  nsec,  tffl  *  200  nsec  and  t^  =  1000  nsec,  one 
pass  through  the  Kalman  filter  with  n  =  9  states  requires  53  ysec.  This 
corresponds  to  1/0.53  ysec  *  18,868  updates  per  second  to  process  one  track 
using  state-of-the-art  32-bit  floating-point  VLSI  chips  assuming  100%  effici¬ 
ency.  Typically,  however,  only  10  to  30%  of  peak  performance  is  achieved  in 
practice.  Therefore,  one  target  track  may  be  updated  at  a  4,000  updates  per 
second  rate.  1000  targets  could  be  updated  at  a  4  Hz  rate  and  10,000  targets 
at  .4  Hz  rate  (every  2.5  seconds).  For  nonlinear  filtering,  typical  of  SDI 
target  tracking  problems  (see  Section  2) ,  64-bit  precision  and  the  need  to 
compute  trigonometric  functions  for  coordinate  transformations  can  slow 
computations  down  by  1  or  perhaps  2  orders  of  magnitude  (lOx  to  lOOx) .  Since 
it  is  well  known  that  the  Kalman  filtering  must  be  performed  using  floating¬ 
point  arithmetic  to  avoid  stability  problems,  the  only  viable  method  to  gain 
back  the  throughput  for  SDI  filtering  problems  using  an  extended  Kalman  filter 
is  with  parallel  processing.  Optical  processing  is  fast  but  optical  fixed- 
point  can  cause  stability  problems  with  the  Kalman  filter.  This  report, 
therefore,  extends  a  systolic  architecture  approach  for  rapidly  implementing 
parallel  Kalman  filters  with  32/64-bit  floating-point  electronic  technology 
for  several  SDI  applications. 

In  this  report,  systolic  array  concepts  are  used  to  develop  efficient 
architectures  for  implementing  the  bank  of  Kalman  filters  shown  in  Figure  1-1 
needed  by  the  recursive  maximum-likelihood  estimator. 

A  system  of  32x32  *  1024  processors  based  on  the  methods  in  this  report 
will  provide  32/64-bit,  IEEE  standard  computations  at  speeds  approaching  lOOOx 
faster  than  today's  technology.  Note  that  decoupling  the  problem  to  run  on 
several  processing  elements  is  required  not  lust  high-speed  arithmetic  units. 


1.2  SUMMARY  OF  THE  APPROACH 

The  Phase  I  plan  was  to  examine  the  feasibility  of  extending  the  2  proces¬ 
sor  parallel  Kalman  filter  algorithms  and  architectures  from  two  (2)  to  n  (even) 
processors.  To  show  feasibility,  the  method  was  extended  from  two  (2)  to  four 
(4)  processors  and  then  to  n  (even)  processors.  The  optimality  of  the  2 
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FIGURE  1-1  Maximum-Likelihood  Estimation  of  Multiple  Targets 

processor  case  was  also  researched.  Parallel  architectures  have  also  been 
developed  to  implement  the  parallel  Kalman  filter  algorithms.  A  summary  of 
the  Phase  I  technical  approach  is  given  in  Figure  1-2. 

Our  Phase  1  technical  approach  covered  the  following  areas: 

•  Defining  a  realistic  SDI  sensor  estimation  problem 

•  Examining  the  class  of  computations  required  for  solving  SDI 
target  tracking  problems 

•  Restructuring  the  Kalman  filter  equations  for  parallel  processing 

•  Expanding  previously-developed  systolic  Kalman  filter  algorithms 
and  architectures  from  2  processor  to  n  (even)  processors 

•  Investigating  the  feasibility  of  implementing  the  parallel  Kalman 
filter  architectures  in  hardware  taking  into  account  wordlength, 
sampling  rate,  memory  and  reliability  Issues 


X  > 


The  major  payoffs  of  the  research  included: 

o  Solving  SDI  problems  which  could  not  be  solved  otherwise  by 
providing  three  (3)  orders  of  magnitude  improvement  (lOOOx)  in 
computational  speed  over  existing  array  processor-based  Kalman 
filter  implementations 

o  Constructing  a  generic  Battle  Management  Testbed  Facility  that 
can  be  used  to  validate  new  parallel  Kalman  filter  algorithms 
as  they  become  available.  The  testbed  permitted  parallel  algo¬ 
rithms  and  architectures  to  be  rapidly  evaluated,  speeding  up 
the  deployment  of  new  designs  in  SDI  systems 

o  Computational  throughput  was  measured  explicitly  ,  rather  than 
estimated,  taking  into  account  processor-to-processor  and 
processor-to-memory  communications  in  the  testbed. 
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1.3  RESULTS 


The  major  results  of  the  Phase  I  feasibility  study  can  be  summarized  as 


follows : 


o  The  "optimality"  of  the  parallel  Kalman  filter  has  been  proven 
analytically  by  showing  that  the  dual  (2)  processor  parallel 
Kalman  filter  is  "mathematically  equivalent"  to  the  standard 
(sequential)  Kalman  filter.  The  dual  (2)  processor  parallel 
Kalman  filter,  however,  executes  twice  as  fast  as  the  standard 
filter  since  the  predictor  and  corrector  in  the  parallel  Kalman 
filter  can  be  computed  in  parallel. 


o  The  parallel  Kalman  filter  for  linear  estimation  problems  has 
been  extended  from  2  to  n  (even)  processors. 


o  Parallel  architectures  have  also  been  developed  to  rapidly 
implement  the  linear  parallel  Kalman  filter  methods. 


o  The  wordlength,  memory  and  VLSI  arithmetic  processor  selections 
have  been  examined  and  documented. 


o  In  addition  to  the  above,  the  linear  parallel  Kalman  filter  has 
been  extended  for  nonlinear  track  processing.  The  extended, 
nonlinear  parallel  Kalman  filter  equations  have  been  developed 
for  the  2  (and  4)  processor  case  under  Phase  I.  Under  Phase  II, 
the  nonlinear  extended  parallel  Kalman  filter  can  be  further 
decoupled  and  generalized  for  n  (even)  processors. 


1 . 4  REPORT  SUMMARY 


The  remainder  of  this  report  is  organized  as  follows.  The  decoupling  of 
the  parallel  Kalman  filter  (PKF)  for  linear  estimation  is  presented  in  Section 
2.  The  optimality  of  the  2  processor  linear  PKF  is  proven  analytically  in 
Section  2.  The  generalization  of  the  two  (2)  processor  linear  PKF  equations 
to  execute  on  n  (even)  processors  is  also  developed  in  Section  2.  Nonlinear 
Kalman  filtering  via  an  extended  parallel  Kalman  filter  is  presented  in 
Section  3.  Parallel  architectures  for  efficiently  implementing  the  PFK 
equations  can  be  found  in  Sections  2  and  3.  The  wordlength,  memory  sizing 
and  VLSI  arithmetic  processor  selection  is  given  in  Section  4.  The  SDI  target 
tracking  problem  is  formulated  in  Section  5  for  demonstrating  the  utility  of 
the  PKF  methods.  Conclusions  and  recommendations  for  future  work  are  pre¬ 


sented  in  Section  6. 
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SECTION  2 


LINEAR  PARALLEL  KALMAN  FILTER  ALGORITHMS  AND 
ARCHITECTURES  BASED  ON  THE  DECOUPLING  PRINCIPLE 


PiJ, 

W| 


The  Kalman  filter  has  been  successfully  applied  to  many  signal  processing 

applications  including  target  prediction,  target  tracking,  radar  signal 

processing,  on-board  calibration  of  inter tial  systems  and  in-flight  estimation 

of  aircraft  stability  and  control  derivatives.  The  applicability  of  the 

Kalman  filter  to  real-time  processing  problems  is  generally  limited,  however, 

by  the  filter's  relative  computational  complexity.  In  particular,  the  number 

of  arithmetic  operations  required  for  implementing  the  Kalman  filter  with  a 

2  3 

state  variable  grcws  as  0(n  )  for  the  time  update  and  as  0(n  )  for  the 
covariance  update.  In  general,  real-time  filtering  cannot  be  performed  on 
large  scale  problems  using  a  uniprocessor  architecture  because  serious 
processing  lags  can  result  (9) . 

The  Kalman  filter  can  be  extended  to  a  much  greater  class  of  problems  by 
using  parallel  processing  concepts.  Full  utilization  of  parallelism  can  be 
obtained  through  insight  in  the  structure  of  the  problem  and  decoupling  of 
arithmetic  processes  to  permit  concurrent  processing. 

To  speed  up  Kalman  filter  computations,  parallel  processing  is  performed 


at  two  levels:  (1)  the  predictor  and  corrector  equations  of  the  Kalman  filter 


are  decoupled  so  that  the  predictor  and  corrector  can  be  computed  on  separate 


rocessors.  and  (2)  the  measurement  data  are  pipelined  into  each  processor. 


Therefore,  both  multiprocessing  and  pipelining  are  considered  to  achieve  large 


Improvements  in  computational  speed. 


I 
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2.1  DECOUPLING  THE  TIME  AND  MEASUREMENT  UPDATE 

The  Kalman  filter  equations  in  Table  1-1  can  be  written  in  predictor- 
corrector  form  as  follows: 


Predictor 


xk<->  ■  »nViw 


pk<->  ■  ♦k-ipk(+)*k-i 


(2.1) 


(2.2) 


I 

i 


1  >  : 


Corrector 


xk(+)  ■  *k(->  +  *k<*k  -  Vk*-” 

Pkw  -  (I  -  KkBk)  Pk<-) 


where  the  Kalman  gain  is 

“k  ■  pk<->  ^ <Vk<->  Hk  +  V 

and  <(>^  *  state  transition  matrix. 


(2.3) 

(2.4) 


(2.5) 

(2.6) 


Note  that  Eqs.  (2.1)  to  (2.5)  are  inherently  sequential  since  the  temporal 
updates  (predictor  equations)  must  be  evaluated  before  the  observation  updates 
(corrector  equations).  From  a  computational  point  of  view,  this  is  not 
desirable  since  to  evaluate  the  corrector  a  uniprocessor  must  wait  until  the 
predictor  has  been  evaluated.  To  avoid  this  difficulty,  the  predictor-corrector 
equations  can  be  decoupled  to  obtain  a  parallel  Kalman  filter  (PKF) . 


2.2  DECOUPLING  THE  STATE  UPDATE  (Travassos,  1982) 

The  decoupling  of  the  predictor  and  corrector  is  achieved  by  forcing  the 
corrector  to  lag  the  predictor  by  one  time  step  as  follows: 

Predictor:  xk+l(_)  “  *k*k-lxk-l  »  (2.7) 

Corrector:  xfc(+)  “  xk^  +  “  Hkxk^-^  *  (2.8) 


2.3  DECOUPLING  THE  COVARIANCE  UPDATE  (Travassos,  1982) 


Let  the  covariance  of  the  estimation  error  before  and  after  a  measurement 
update  be  denoted  by: 

pk«<-)  '  ;k+i<->  •  (2-9> 

Pk(+)  -  Exk(+)  ij(+)  ,  (2.10) 

where 


xk+i(-}  4  W->  -  Xk+1  * 

(2.11) 

*k(+)  A  xk(+)  -  xk  . 

(2.12) 

By  direct  computation,  it  can  be  shown  that  the  covariance  of 

error  before  the  update  is  given  by: 

the  estimation 

Pk+1(_)  “Mk-l(+)  Pk-l^k-l^k 

(2.13) 

Because  the  form  of  Eq.  (2.8)  is  the  same  as  Eq.  (2.9),  the  covariance  of  the 


estimation  error  after  a  measurement  update  in  the  PKF  is  given  by: 


Pk<+)  "  (I  -  W  Pk(-)  ,  (2.14) 

where 

\  ■  pk(->  <<Hkpk(->  <  +  V'1  (2-15) 

2.4  SUMMARY  OF  THE  DECOUPLED  PKF  EQUATIONS  FOR  TWO  PROCESSORS  (Travassos,  1982) 

Because  the  predictor  and  corrector  equations  in  the  Kalman  filter  can  be 
decoupled,  computations  can  be  performed  simultaneously  on  two  separate 
processors,  one  processor  for  the  predictor  equations  and  one  processor  for 
the  corrector  equations.  In  summary,  the  parallel  Kalman  filter  (PKF) 


equations  are: 

Predictor  I 

|  W-)  =  Vk-A-i(+) 

T  T 

(2.16) 

1 

!  pw(-''-'k*i-ipk-iw*i,-i'i 

(2.17) 

Corrector  j 

i  V-+)  *  5k(-> +  Vzk  -  Hkik(-)> 

(2.18) 

1  pk<+>  -  « -  w  v-> 

(2.19) 

Kalman  1 

Gain  1 

!  ^  -  pk<->  Hk<"kpk(->  Hk  +  V1 

(2.20) 

Although  these  results  can  be  applied  to  a  "linearized"  model  of  the 
sensor  track  processing  problem,  we  propose  to  extend  these  results  to  work 
with  the  target's  nonlinear  equations  of  motion  directly.  Using  the  nonlinear 
equations  of  motion  will  provide  more  accurate  tracking  (see  Section  3.1)  for 
the  nonlinear  parallel  Kalman  filter  equations).  Both  linear  and  nonlinear 
parallel  Kalman  filter  techniques  can  be  applied. 

2.5  OPTIMALITY  OF  THE  PARALLEL  KALMAN  FILTER 

To  show  that  the  PKF  is  optimal  (in  the  same  sense  as  the  standard  Kalman 
filter),  the  optimality  proof  will  be  separated  into  two  parts.  First  we 
shall  show  that  the  PKF  generates  the  same  state  update  as  the  SKF.  Secondly, 
the  covariance  update  in  the  PKF  is  shown  to  be  equivalent  to  the  covariance 
update  in  the  PKF.  Combining  these  results  shows  that  the  PKF  is  optimal 
since  both  the  state  and  covariance  updates  are  "mathematically  equivalent"  to 


For  the  SKF,  substituting  Eq.  (2.9)  into  Eq.  (2.7)  evaluated  at  time 
k+1 ,  we  have : 


ik+l(-)  '  +kik<->  +  ♦kV'k  -  Hk*k<">) 

where 

\  ‘  pk(->  “k'Vk'-1  Hk +  V1 

Note  that  the  innovations  sequence  is  given  by: 

zk/k+l  “  (zk  ~  Hkxk(-):>  k  “  0*  1 »  2*  •** 
Substituting  Eq.  (2.7)  back  into  Eq.  (2.21),  we  have 

“  Vk-i*k-i(+)  +  Wzk  -  Hkxk(-}) 


(2.21) 


(2.22) 


(2.23) 


(2.24) 


transition 

update 


innovations 

update 


Thus,  the  state  update  in  the  SKF  consists  of  two  parts  -  the  transition 
update  and  innovations  update.  Our  objective  was  to  show  that  the  PKF  state 
update  is  equivalent  to  Eq.  (2.24).  To  do  this,  we  needed  to  prove  the 
following  lemma: 

Lemma  #1 


For  the  PKF  defined  by  Eqs.  (2.16)  to  (2.20), 
xk<+)  ■  *k-ixk-i(+) 


Vi(-> 


Vi(+> 


AKD  *kM  .  *k.1xk.1(-) 

PROOF :  Evaluating  Eq.  (2.16)  at  time  k,  we  have 

ik(->  -  +k-l*k-2\-2<+) 


(2.25) 

(2.26) 


(2.27) 


(2.28) 


But  iji^  2xk-2^+^  “  xk-l^  ^y  Eq*  (2.25)  evaluated  at  time  k— 1 . 
Evaluating  Eq.  (2.28)  at  time  k-1  results  in 

Xk-1 “  ^k-2xk-2^+^  “  xk_l(+)  { 


by  virtue  of  Eq.  (2.25).  Thus, 


Xk-1(_)  “  xk-l(+)  * 


(2.29) 


(2.30) 
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The  next  part  of  the  proof  is  to  show  that  xk(-)  *  $k  ^  (-)  in 
the  PKF.  Evaluating  Eq.  (2.16)  at  time  k+1 ,  we  have: 


Vi(->  ■  Vk-Isk-1 

or  at  time  k 


(+) 


Vk(+) 


(2.31) 


xk(-> 


*k-lxk-l(+) 


But  by  Eq.  (2-20)  we  can  conclude  the  proof  as  follows: 
*k(-)  ■  »k-isk-i(-> 


(2.32) 


(2.33) 


Now  we  can  use  the  results  of  Lemma  #1  to  show  that  the  PKF  state  update  is 
"mathematically  equivalent"  to  the  SKF  state  update. 


Theorem  1 : 


Because  the  transition  state  update 

Vk-ixk-i(+)  =  Vk-A-i^ 


(2.34) 


and  innovations  state  update 

<t>k-lKk-l(zk-l  "  Hk-lxk-l(+))  “  S-^k  '  Hkxk(_))  (2.35) 


Proof : 


are  mathematically  equivalent,  the  PKF  and  SKF  state  updates  are 
mathematically  equivalent. 

The  transition  update  is  easy  using  Lemma  #1,  Eq.  (2.26)  since: 

♦kVi^k-i^  *  Vk-ixk-i(+) 


(2.36) 


Pre-multiplying  the  innovations  in  the  PKF  state  update,  we  have: 


^k-l^-l^k-l  “  Hk-lxk-l(+))  =  *k-l(xk-lW  "  xk-l(_))  (2.37) 

by  Eq.  (2.18)  evaluated  at  time  k-1.  Expanding  Eq.  (2.37)  results  in 


-  Vixk-i(+)  -  ^k-ixk-i(-} 

-  xk(+)  -  xk(-) 


(2.38) 

(2.39) 


by  Eqs.  (2.25)  and  (2.27)  in  Lemma  #1.  But  xk(+)  -  xk(-)  * 
K^(zk  -  Hkxk(-))  using  Eq.  (2.18)  in  the  PKF.  Thus,  it  can  be 
concluded  that 


^k-l^k-1  ^zk-l  “  Hk-lxk-l(+))  "  ^^k  “  Hkxk(_)) 


■  (2.40) 


as  desired 


Now  that  we  have  shown  that  the  PKF  and  SKF  state  updates  are  "mathematically 
equivalent."  we  turn  our  attention  to  the  covariance  update. 

The  Covariance  Update: 

From  Eqs.  (2.7)  to  (2.11),  the  covariance  update  in  the  SKF  can  be 
summarized  as  follows: 


pk(->  -  +k-ipk-i<+)*k-i 

V+)  ‘  (I  *  W  pk(->  *  pk<->  -  Wk<-> 


(2.41) 

(2.42) 


Replacing  P^  ^ (+)  in  Eq.  (2.41  evaluated  at  time  k+1  with  Eq.  (2.42)  we 
have: 


pk+i<+)  ■  ♦kpk<->  *1  -  *kWkw  *k 


(2.43) 


Now  substituting  Eq.  (2.41)  evaluated  at  time  k  into  Eq.  (2.  )  results  in 


Pk+L<+>  *  »k*k-lPk-l(+>  Cl»k  -  ♦kWk<->  *k 


(2.44) 


transition  update 


gain  update 


which  summarizes  the  complete  SKF  covariance  update.  As  before,  our  objective 
is  to  show  that  the  PKF  covariance  update  is  "mathematically  equivalent"  to 


Eq.  (2,44).  Another  lemma  is  useful  in  proving  this  result. 
Lemma  #2 


g 


For  the  PKF  defined  by  Eqs.  (2.16)  to  (2.20), 
pk(+>  •  ♦k-ipk-i<+>  Ci 


THEN 


Pk-l<+) 


AND  Pk<->  -  ♦k_iPk.l<+>  *k_! 

PROOF :  Evaluating  Eq.  (2.17)  at  time  k,  we  have 
Pk(_)  "  ♦k-l*k-2Pk-2(+)  <(,k-2<J,k-l 


(2.45) 

(2.46) 


(2.47) 


(2.48) 


But  by  Eq.  (2.45)  evaluated  at  time  k-1,  Eq.  (2.48)  can  be  written 
as  follows: 


Pk^“)  "  ^k-lPk-l ^  ^k-1  which  proves  Eq.  (2.47) 
Evaluating  Eq.  (2.49)  at  time  k-1,  results  in 

pk_l<-)  “  <(>k-2Pk-2(+:>  ^k-2  ‘  Pk-l(+) 


(2.49) 


(2.50) 


using  Eq.  (2.45)  evaluated  at  time  k-1.  Hence,  P^_j (-)  =  ^ (+) 

as  desired. 

Now  we  can  use  the  results  of  Lemma  It 2  to  show  that  the  PKF  covariance  update 
is  "mathematically  equivalent"  to  the  SKF  covariance  update. 

Theorem  it  2: 

Because  the  transition  covariance  update 

Mk-l^-l^  *k-l*k  "  Vk-lPk-l(+)  ^k-l^k 
and  the  gain  covariance  update 

■  Wk<-> 

are  mathematically  equivalent ,  the  PKF  and  SKF  covarinace  updates  are 
mathematically  equivalent. 

PROOF:  To  begin,  substitute  the  PKF  covariance  update  Eq.  (2,19)  evaluated 
at  time  k-1  into  Eq.  (2.17)  to  obtain: 

W->  '  -  Kk-A-i)  pk-i(-)  *Li*k 

■  Mk-ipk-i(->  *k-i*k  -  *k<*k-iKk-A-ipk-i(->  ♦Li)  *k  <2-5I> 


transition  covariance 

Note  that  if  we  can  prove  that 
T  ,T 


gain  covariance 


$  ^  ^  P^_  ^  (")  $  |^_ 
and 


*k*k-lPk-l(+)  ^k-l^k 


(2.52) 


Ok-lKk-lWk-l^  'f’k-l  "  Wk(">  (2*53> 

then  the  PKF  update  in  Eq.  (2.51)  will  be  the  same  (i.e.,  equivalent) 
to  the  SKF  update  in  Eq.  (2.44).  Therefore,  the  PKF  would  be  optimal 
since  both  the  state  and  covariance  updates  of  PKF  are  ''mathematically 
equivalent"  to  the  SKF.  Eq.  (2.52)  is  easy  to  prove  due  to  Eq.  (2.46) 
in  Lemma  H2.  Now  to  prove  Eq.  (2.53).  To  do  this,  note  from  the  PKF 
Eq.  (2.19)  evaluated  at  time  k-1  we  can  write: 


VlVlVlW  “  Pk-1(_)  "  Pk-l(+) 


(2.54) 


Now  pre-multiply  both  sides  of  Eq.  (2.54)  j  and  post-multiply  the 
T 

same  by  ^ 


to  obtain: 


<t>k-lKk-lHk-lPk-l(_)  *k-l 


(2.55) 


=  ♦k-l<Pk-l(->  '  Pk-l(+))  *k-l  (2‘55) 

"  4’k-lPk-l(_)  ^k-1  "  *k-lPk-l(+)  *k-l  (2,56) 

-  Pk(-)  -  Pk(+)  -  KkHkPk(_)  (2‘57) 

from  Lemma  // 2  and  Eq.  (2.19)  of  the  PKF.  Therefore,  we  have  shown 
that 


♦k-A-i\-ipk-i<->  *k-i  -  Wk<-> 


(2.58) 


Summary 

This  section  demonstrates  the  parallel  Kalman  filter,  originally 
developed  by  Travassos  (1985),  based  upon  decoupling  the  filter’s  predictor 
and  corrector  equations,  is  optimal  in  the  same  sense  as  the  standard  Kalman 
filter.  This  was  proven  via  two  lemmas  and  two  theorems.  Since  the  PKF 
method  can  be  extended  to  more  than  two  processors,  it  is  anticipated  that 
the  optimality  proof  can  be  extended  as  well.  Furthermore,  extending  the 
results  to  nonlinear  filtering  is  also  anticipated  to  be  successful. 


2.6  GENERALIZATION  TO  n  (EVEN)  PROCESSORS 

Now  that  the  stability  and  convergence  of  the  two-processor  parallel 
Kalman  filter  (PKF)  has  been  analytically  investigated  and  shown  to  be 
"mathematically  equivalent"  to  the  standard  Kalman  filter  (SKF) ,  we  now  focus 
our  efforts  on  extending  the  PKF  algorithm  to  run  on  n  (even)  processors.  Our 
approach  is  based  on  the  principle  of  induction;  i.e.,  we  develop  the  two- 
processor  then  the  four-processor  case  and  then  by  induction  generalize  the 
method  for  n  (even)  processors. 

To  provide  a  more  accurate  solution,  the  generalized  method  is  based  on 

the  trapezoidal  rule  of  numerical  integration  rather  than  Euler  integration 

which  has  been  used  to  date.  The  Kalman  filter  update  is  then  accurate  to 
2 

0(h  )  rather  than  0(h).  The  additional  accuracy  is  important  because  it  is 

anticipated  that  the  integration  step  size,  h,  will  be  large  due  to  the 

computational  complexity  of  the  Kalman  filter  equations.  For  example,  with  a 

sample  rate  of  100  Hz,  the  Euler-integration-based  PKF  would  be  accurate  to 

0(h)  ■  0.01,  while  the  trapezoidal-rule-based  PKF  would  be  accurate  to 
2 

0(h  )  •  0.0001  (i.e.,  as  accurate  as  the  12-bit  sensor  data). 
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2.6.1  PKF  Based  on  the  Trapezoidal  Rule  (Two  Processors) 

The  trapezoidal  rule  for  integrating  a  set  of  ordinary  differential 
equations  is  given  by: 

xk+l  *  *k  +  **  h(f(xk,tk)  +  f (xk+l,t:k+l^  (2.59) 

where  x  is  the  solution  of  the  ode,  f(x,t)  is  the  right-hand-side  (RHS)  of 

the  initial  value  problem  and  h  *  tic+i  ~  tk  is  tbe  integration  step  size. 

The  trapezoidal  rule  is  an  implicit  method  since  appears  implicitly  on 

the  RHS  of  Eq.  (2.59).  Note  that  f(x,t)  can  be,  in  general,  a  nonlinear 

function  or  linear  such  as  f(x,t)  *  Fx(t).  To  solve  Eq.  (2.59),  a  predictor 

is  needed  of  the  form  below  to  estimate  x.  , . . 

k+1 

Xk+1  -  xk  +  hf (vV  (2-60) 

Hence,  combining  Eqs.  (2.59)  and  (2.60),  we  obtain  a  predictor-corrector 
method  based  on  the  trapezoidal  rule: 


Predictor: 

xp 

k+1 

-  *k 

+  hf(\»V 

(2.61) 

Corrector: 

xc 

k+1 

•< 

+  h  h(f(xk,tk)  +  f (xk+i»tk+l)) 

(2.62) 

Note  that  the  predictor  must  be  evaluated  before  the  corrector  equation  can  be 
computed.  A  parallel  predictor-corrector  (PPC)  method  has  been  derived  by 
Miranker  (1967)  that  allows  the  predictor  and  corrector  to  be  evaluated 
simultaneously  on  two  (2)  processors  as  follows: 


Parallel  Trapezoidal  Rule  (Two  Processors) : 


Predictor: 

xp 

xk+l 

■  Vi  +  2h£k 

(2.63) 

Corrector : 

c 

xk  * 

4-i +  w>«E +  £k-i> 

(2.64) 

where  fp  -  f(xp,k) 

and 

4-i  ■  £<4-i>k-1)- 

In  the  special  case  when  fp  *  ^kxk^  is  t*ie  ^HS  t*ie  Kalman  filter 
state  update  before  a  measurement  and  Gp  * 

RHS  of  the  covariance  update  before  a  measurement,  then  the  two  (2)  processor 
parallel  Kalman  filter  can  be  derived  as  follows: 

Parallel  Kalman  Filter  Based  on  Trapezoidal  Rule  (Two  Processors) 
Predictor:  xk+1  (“)  -  xk_)(+)  +  2h<t>kxk(-) 


♦k(I-KkHk)Pk(-)(I-KkHk)‘*‘  is  the 


(2.65) 


where  g£  -  ♦k(I  -  +  Pk(-)U  -  \\)T  *k 


(2.67) 


Corrector:  xfc(+)  •xk_1(+)  +  h/2(<)>kxk(-)  +  4>k_1xk_1  (+) )  +  Kic(zk-Hkxic(-))  (2.68) 
Pk(+)  "  Pk-l(+)  +  h/2(Gk  +  Gfc-i)  "  KkHkPk(')  (2.69) 


where  g£  -  ♦k(I  -  KjH^P^-)  +  Pk(-)(I  -  \\)  *k 


Gk-1  =  *k-lPk-l(+)  +  Pk-l(+)  *k-l 


(2.70) 

(2.71) 


In  the  above  PKF,  the  (-)  notation  represents  a  value  before  a  measurement 
update  and  the  (+)  notation  is  a  value  after  a  measurement  update.  Similarly, 
the  p  for  predictor  corresponds  to  the  (-)  notation  and  the  c  for  corrector 
value  corresponds  to  the  (+)  notation. 


2.6.2  PKF  Based  on  the  Trapezoidal  Rule  (Four  Processors) 

With  four  (4)  processors,  the  parallel  trapezoidal  rule  is  given  by: 
Parallel  Trapezoidal  Rule  (Four  Processors: 


In  the  case  where 


p2k  2k 


xp 

2k+2 

c 

“  X2k-2 

+  4hf2k 

X2k+1 

“  X2k-2 

+  3/2  h(f : 

X2k  " 

xC  - 

2k-3 

h/2  (3f 2k 

xc 

2k-l 

c 

*  x2k-3 

+  2hf2k_2 

where 

fp  = 

2k 

f(x2k*2k)» 

and 

fc 

2k-2 

f(x2k-2’ 

-)  , 

c2k-l(“ 

)  and 

P  +  fP  ' 
2k  +  r2k-T 


(2.72) 

(2.73) 

(2.74) 

(2.75) 


f2k-2  "  *2k-2  *2k-2(+) 


(2.76) 

(2.77) 

(2.78) 


are  the  KHS  of  the  state  update  in  the  Kalman  filter  and 


G2k  "  *2k(I  "  K2kH2k*  P2k(-)  +  P2k(_)  (I  “  K2kH2k)  *2k 


(2.79) 


G2k-1  "  *2k-l(I~K2k-lH2k-l)P2k-l(  )  +  P2k-1 (I_K2k-lH2k-l )  <f)2k-l  (2.80) 


and  G. 


*2k-2  P2k-2(+)  +  P2k-2(+)  *2k-2 


(2.81) 
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Parallel  Kalman  Filter  Based  on  Trapezoidal  Rule  (Four  Processors) 
Predictor: 


X2k+2(_) 


X2k-2(+)  +  4h  <t’2kX2k(_) 


X2k+l<-) 


X2k-2(+)  +  3h/2  (<J>2kx2k(_)  +  4>2k-lx2k-l ) 


P2k+2(-) 


P2k-2(+)  +  4hG2k 


P2k+1 


P2k-2 (+)  +  3h/2  <G2k  +  G2k-1> 


where  G2^  05  ij> 0L, (I  -  K0VH0U)  Pou(-)  +  P01> (-)  (I-K0,^HOU) 


T  T 


’2k-l 


-  $ 


2k- 1 


(I-K 


1  H„ 


P2k-l(-) (1  ”  K2k-1^ 


2k-l  2k-l '  2k-l 
T  1  T 
^  2k-l 


(-)  + 


Corrector: 


X2k(+) 


X2k-3(+)  “  h/2  (3*2kx2k^  "  9  <f)2k-lX2k-l (  ^  + 


K2k(z2k  "  H2kx2k(-)) 


X2k-l(+)  =  X2k-3(+)  +  2h  <*,2k-2X2k-2(+')  + 


K2k-l(z2k-l  H2k-1(  )  x2k-l( 


p2k(+)  *  P2k-3(+)  "  h/2  (3G2k  "  9G2k-l)  +  (I_K2kH2k)P2k(_) 


P2k-1 (+) 


P2k-3(+)  +  2hG2k-2  +  (I  "K2k-lH2k-l)P2k-l(-) 


where  G 


2k 


and  G 


2k- 1 


are  defined  above  and 


2k- 1 
„c 


’2k-lP2k-l(+)  +  P2k-l(+)  ^ 2k- 1 


2k-2 


’2k-2P2k-2(+)  +  P2k-2(+)  <t>2k-2 


2.6.3  PKF  Based  on  Trapezoidal  Rule  (n  (even)  Processors 


(2.83) 

(2.84) 

(2.85) 

(2.86) 


(2.87) 


(2.88) 


(2.89) 

(2.90) 

(2.91) 


(2.92) 

(2.93) 


In  the  previous  sections,  the  PKF  equations  were  derived  for  the  two 

g 

processor  and  four  processor  cases.  In  general,  n  -  2  (even)  processors 

S~1 

are  needed  to  implement  the  PKF  equations.  If  we  let  m  -  2  and  q  *  m-1, 
the  PKF  equations  can  be  generalized  to  run  on  n  processors.  The  RHS  defini¬ 
tions  for  the  generalized  PKF  equations  are  defined  as  follows. 


(2.96) 


BB 


mk-m 

fC. 

mk-m 


$  ,  x  .  (-) 

mk-m  mk-m 

$  ,  x  ,  (+) 

mk-m  mk-m 


(2.97) 


With  these  definitions,  the  PKF  for  n  (even)  processors  are  summarized  in 
Eqs.  (2.98)  to  (2.115)  on  the  following  pages. 


2.7  SUMMARY 

Although  it  was  shown  that  the  PKF  equations  can  be  generalized  for  n 
(even)  processors,  the  following  recommendation  is  made  based  on  our  experi¬ 
ence  with  parallel  computing.  It  is  recommended  that  the  four  processor  PKF 
equations  be  used  to  propagate  each  state  (or  covariance)  equation  separately 
in  a  nine-state  target  tracking  application.  With  four  processors  per  state, 
a  4x  speed-up  is  possible  per  state  equation  update.  With  four  processors  on 
a  card,  the  nine-state  filter  could  be  implemented  with  nine  cards  or 
9x4  ■  36  processing  elements.  This  approach  may  be  computationally  easier  to 
manage  and  provide  a  9x4  ■  36x  speed-up  compared  with  partitioning  the 
original  PKF  equations  to  run  on  36  processors.  The  main  concern  is  that 
when  the  large  number  of  algebraic  terms  that  arise  from  the  decoupling  are 
computed,  human  error  in  the  algebraic  partitioning  could  lead  to  inaccurate 
tracking.  The  partitioning  can  be  automated  by  a  computer  to  minimize  this 
potential  problem.  In  addition,  it  is  well  known  that  beyond  say  32  proces¬ 
sors,  the  benefit  of  parallel  processing  is  less  than  linear  and  adding  more 
processors  may  not  speed  up  overall  computations  that  much.  Hence,  using 
four  processors  per  state  (or  covariance)  equation  provides  maximum  benefit, 
ease  of  partitioning  and  reduces  the  potential  for  human  error  in  the 
decoupling. 
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Parallel  Kalman  Filter  Based  on  Trapezoidal  Rule 
Generalization  to  n  (even)  Processors 
State  Predictor  Equations; 


Xmk+n/~^ 

x  , 
mk-m 

Xmk+m-l 

i 

r 

Xmk+m-2^ 

■  *«*. 

Xmk+m-3^~^ 

"  Xmk- 
• 

• 

• 

• 

• 

x  _  (-) 

mk+m-q 

A 

*  X  , 
mk* 

mk 


(2.98) 

(2.99) 

(2.100) 

+  fp 

•  mk-m) (2.101) 


»  +  +  f^-1  +  f»k-2  +  •  •  •  +  £"  .  -> 


mk-m 


(2.102) 


State  Corrector  Equations: 


*mk<+>  “ 

Vi(+) 

5mk-2<+> 


mk-n+1 


(+)  +  (n-q)/2h(f£ 


mk-m+q 


+  . . .  +  f 


+  fH 

mk-m+2  mk-m+1 


mk-n+1 

^mk-n+1 


(+)  +  (n-2)/2h(fP  __  +  £.  +fP  _) 


x_,  _  .,(+)  +  (n-1)  /  2h  (fP  ,  +  fPk_m) 


mk-m+2 

^mk-m+1 


mk-m+1  mk-m 


+  fmk-m)(2-103) 

(2.104) 

(2.105) 


x  .  (+)  -  x.  _,_.(+) +n/2h(3qfP,  +  q33qfP,  +  fc,  )  (2.106) 

mk-q  mk-n+1  mk  ^  mk-m  mk-m 

Definitions 

For  the  parallel  predictor-corrector  generalization,  define  the 
following  variables: 

g 

n  ■  the  number  of  parallel  processing  elements  -  2  (i.e.,  a  power  of  2) 

8-1 


m  -  2 


the  number  of  processors  used  by  the  predictor  (or  corrector) 


q  »  m-1  •  a  convenient  definition 


Parallel  Kalman  Filter  Based  on  Trapezoidal  Rule 
Generalization  to  n  (even)  Processors 


Covariance  Predictor  Equations: 


-  W+>  +  "“Ik 


(2.107) 

(2.108) 


'  W+)  +  ("-1>'21'  “4  +  gL-i>  «-108> 

P.k«-2W  -  W»<+>  +  <"-2>'2h  <4  +  4-1  +  4-2>  <2-109> 

P*«-3W  ‘  P»k-»(+)  +  ("-3)/2h  <4  +  Gmk-1  +  Gmk-2  +  -  +  4-„>(2-U0> 


'  Pmk-B<+)  +  <4  +  4-1  +  4-2  +  •"  +  4-»)(2'1U> 


Covariance  Corrector  Equations: 


?_,.<+>  “  P. 


mk-n+1 


(+)  +  (n-q)/2h  (GP 


mk-m+q 


+  . . .  +  G 


mk-m+2  mk-m+1  mk-m 


+  GP  )  (2. 112) 


(+)  -  P 


mk-n+1 


(+)  +  (n-2)  /2h  (GP  _  +  GP 


|H  +  GF  .  \ 

mk-m+2  mk-m+1  mk-nr 


P  .  .(+)  -  P  .  _,,(+)  +  (n-1) /2h  (GP  .  +  GP,  ) 

mk-2  mk-n+1  mk-m+1  mk-m 


(2.113) 


(2.114) 


P  u  (+) 

mk-q 


mk-n+1 


<+)  +  n/2h  <3V.  +  ,33’cp  +  c'  .) 


(2.115) 


SECTION  3 


NONLINEAR  EXTENDED  PARALLEL  KALMAN  FILTER  ALGORITHMS 
AND  ARCHITECTURES  BASED  ON  THE  DECOUPLING  PRINCIPLE 


3.1  BACKGROUND 

The  SDI  sensor  processing  problem  is  nonlinear  due  to  coordinate  trans¬ 
formations  and  angle-only  measurements.  The  SDI  target  math  model  can  be 
summarized  as  follows  for  the  extended  Kalman  filter: 

x(t)  -  f(x(t),t)  +  w ( t )  0  1  t  1  T 

*  h(x(tk))  +  vk  k  =  1,  2,  3 . N 

The  well-known  extended  Kalman  filter  for  a  sequential  computer  is  given  by 
(Gelb ,  1974): 


Predictor: 

x(t)  -  f (x (t) )  x(tQ)  *  xo 

P(t)  -  F (x (t) )  P(t)  +  P(t)  FT(x (t) )  +  Q(t)  ,  P(tQ)  =  PQ 


Corrector: 


*k(+) 

Pk(+> 

*k 

where 


x(-)  +  ^(Zj^  -  \(xk(-))) 


(I  -  wv-»>  pk(-> 

pk(_)  Hk(ik(_))  *  (Hk(ik(_))  pk(_)  +  V 


F(x(t) ) 
\(x(-)) 


9f (x(t) ) 
3x(t) 

3h(x(tk)) 

3x(tk) 


x (t)  -  i(t> 

x(tk)  -  X (— ) 


3.2  DECOUPLING  THE  EXTENDED  KALMAN  FILTER  STATE  UPDATE 

With  two  processors,  the  state  predictor  and  corrector  can  be  decoupled 
by  forcing  the  corrector  to  lag  the  predictor  by  one  time  step.  If  a  parallel 
version  of  the  Trapezoidal  Rule  (Miranker,  1967)  is  used  to  integrate  the 


nonlinear  state  model  equations,  the  extended  Kalman  filter  state  predictor 

2 

and  corrector  can  be  computed  as  follows  with  accuracy  of  0(h). 

Predictor: 

Xk+1  <~>  ”  Xk-l(+)  +  2h£k 

where  h  is  the  integration  step  size  and 
£k  - 

Corrector: 

*k(+)  *  *k-i<+)  +  h/2<£k +  £k-i>  +  -  1><>‘k<-))) 

where  -  £<Vl <+)  •  W 

With  two  (2)  processors  the  state  predictor  and  corrector  can  be  computed 
using  a  4th  order  Parallel  Adams-Moulton  Method  as  follows  with  O(h^) 
accuracy. 


Predictor : 


*'k+l<->  •  VlW  +  h/3(8fk  -  5£k-l  +  <-2  -  fk-3> 

where  f£  -  f(xk(-),tk)  ,  f£_2  -  f (xk_„(+) ,tR_2)  and 


fk-3  “  f  (xk-3(+)  ^k^ 


Corrector : 


Xk(+)  "  *k-l(+)  +  h/24(9fk  +  19fk_l  ‘  5fk-2  +  fk-3}  +  Kk(zk-h(xk(_))) 
The  key  here  is  that  all  the  values  on  the  right  hand  side  (RHS)  of  the  above 
predictor/corrector  equations  are  available  for  simultaneous  computation  in 
parallel  (see  Figure  3-1  below) . 


FIGURE  3-1  Computational  Wavefront  of  Extended  PKF  State  Update 


3.3  DECOUPLING  THE  EXTENDED  KALMAN  FILTER  COVARIANCE  UPDATE 


Let  the  covariance  of  the  estimation  error  before  and  after  an  update  be 
denoted  as  in  Section  2.  Now  if  we  define  the  RHS  of  the  continuous  covari¬ 
ance  update  as  follows,  we  can  compute  the  extended  Parallel  Kalman  Filter 
(PKF)  covariance  in  parallel: 

P(t)  -  F(x(t))P(t)  +  P(t)FT(x(t))  +  Q(t)  P(tQ)  =  PQ 

-  G(F(x(t)),P(t),Q(t),t) 


where  F(x(t)) 


3f  (x(t)) 
9x(t) 


Note  that  G  is  a  matrix  differential  equation  that  is  nonlinearly  related 
to  P. 


With  two  (2)  processors  using  the  parallel  trapezoidal  rule,  the  extended 

2 

Kalman  filter  covariance  update  can  be  computed  with  0(h)  accuracy  as  follows 
Predictor:  Pfc+1 (-)  -  Pfc_1(+)  +  2hc£ 

where  G?  -  G(F(xk(-) ) ,Pk<-) ,Qk<-) .tfc) 

Corrector:  Pfc(+)  -  Pk_j(+)  +  h/2(c£  +  Gj^)  +  (I  -  )Pfc(-) 

where  Gk_j  “  G(F(x^_^  (+)) ,  Pk_^ (+) ,Qk_^ (+) » tk_j ) 

a  U  t*  i  w  3h(x(t.  )) 
and  H  (x, (-))  - 

3x(t.) 

x(tk)  -  xk(-) 

With  two  (2)  processors  the  covariance  predictor  and  corrector  equations 
can  be  computed  using  the  4th  order  Parallel  Adams  Moulton  Method  with  O(h^) 
accuracy  as  follows: 

Predictor:  Pk+1<->  -  Pk.,W  ♦  h/3(8G>  -  Sg'.j  +  -  <£_3> 

where  g£_x  -  G(F(ik_j (+) ) ,Pk_1 (+) ,Qkl (+) , tkl) 

Gk-2  “  G(F(xk-2(+)) ,Pk-2(+) ,Qk-2(+) ,tk-2) 

Gk-3  “  G(F(xk-3(+)),Pk-3(+),Qk-3(+)*tk-3) 

Corrector:  Pk<+)  -  P^W  +  h/24(9Gj  +  190^  -  5g£_2  +  Gk_3> 

-  KkHk(xk('»Pk(-> 

In  both  the  two  and  four  processor  extended  Parallel  Kalman  Filter 


equations,  the  Kalman  gain  is  computed  as  follows: 

Kalman  Gain:  Pk(-)HT(xk(-)*(Hk(ik(-))Pk(-)Hk(xk(-))  +  R^"1 

The  extended  parallel  Kalman  filter  algorithms  presented  in  this  section 
can  be  generalized  to  run  on  a  large  number  of  processors.  The  four  processor 
extended  PKF  based  on  the  parallel  Adams  Moulton  method  is  of  major  interest 
because  the  4th  order  parallel  Adams  Moulton  method  is  fast  (because  the  equa¬ 
tions  can  be  computed  on  four  processors  four  times  faster  than  on  a  sequential 
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computer)  and  accurate  to  0(h)  to  give  eight  digit  or  more  precision.  In 
addition,  this  method  is  well  matched  for  execution  on  a  parallel  computer  with 
four  processors  per  node  (like  the  Systolic-481  board  discussed  in  Section  4. 

A  candidate  architecture  for  evaluating  the  4th  order  parallel  Adams  Moulton 
method  is  given  in  Figure  3-2  below. 


s 


c 

i-1 


f 


c 

i-3 


FIGURE  3-2  Parallel  Architecture  for  the  4th  Order  Adams  Moulton  Method 


SECTION  4 


WORDLENGTH,  MEMORY  AND  PARALLEL  PROCESSOR  SELECTION 


Before  implementing  the  parallel  Kalman  filter  algorithms  and  architec¬ 
tures  derived  previously  in  this  report,  it  is  of  interest  to  examine  the 
wordlength,  memory  and  timing  requirements  for  these  methods.  The  timing 
requirements,  in  particular,  have  a  direct  impact  on  the  selection  of  the 
VLSI  arithmetic  processors  used  for  computation.  Memory  speed  and  bus  con¬ 
siderations  also  impact  overall  system  throughput.  To  start,  wordlength 
considerations  are  analyzed  first. 

4.1  ERRORS  IN  THE  KALMAN  GAIN 

Suppose  that  due  to  numerical  inaccuracies,  the  actual  Kalman  gain 
consists  of  two  parts: 


actual 

Kalman 

gain 


K  +  K  +  AK 

error  in  Kalman  gain 
exact  Kalman  gain 


K  +  K  +  AK 

iff 

lal  I  ex 

nan  | 

>  __  ■ _ M.  JT^-1 


Then  the  finite  wordlength  effects  of  the  mantisa  in  floating  point  arithmetic 
on  the  Kalman  filter  can  be  summarized  by  the  following  theorem: 


Theorem  4.1:  For  the  linear  Kalman  filter  in  Section  2,  given  that 

inaccuracies  exist  in  the  Kalman  gain  (i.e.,  K  =  K  +  AK) ,  the 
number  of  bits  needed  in  the  mantisa  to  ensure  stability  is 
given  by: 


where 


e  >  0  is  the  error  tolerance,  n  is  the  system  order  and 


Corollary 


Proof  s 


For  H  -  I  and  P(-)  =  I,  the  number  of  bits  needed  in  the 
mantissa  to  ensure  stability  can  be  estimated  as  follows: 

b  -  -  lo*2  (if) 

Substituting  K  *  K  +  AK  into  the  P(+)  update  we  have: 

P(+)  -  (I  -  (K  +  AK)H)  P(-)  >  e  (4.2) 

-  (I  -  KH)  P(-)  -  AKHP(-)  >  e  (4.3) 

P(+) 

->  -  AKHP(-)  >  e  (4.4) 

T 

Now  multiplying  both  sides  of  Eq.  (4.3)  by  -H  gives: 

AK  HP(-)HT  1  -  eHT  (4.5) 

scalar  >  0  if  P(-)  >  0 
T 

Since  HP(-)H  >  0  and  a  scalar,  we  can  divide  both  sides  of 
Eq.  (4.4)  by  HP(-)HT.  Hence,  we  have: 

,T 


AK  <  - 


eH 


HP (-)H1 


(4.6) 


since  the  H 


|  H | .  Now  suppose  [ AK |  £ 


>  Y  2  b»  then 


£  2"b  1  I  AK  |  i  - 

^  I  /  \  ,T  I  I 


(4.8) 


I  HP (-)H 

Taking  the  log2  of  both  sides  gives: 
-b  i  log 

2\  | HP (-)HT |  / 

or  equivalently 

b  >  -  log  M  ,  ) 

2  \  n  |hp(-)ht|  / 


(4.9) 


(4.10) 


Note  that  if  H  -  I  and  P(-)  -  I,  the  desired  result  is 
obtained  as  follows: 


e  -  10 
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Theorem  4.2: 


Corollary: 


Proof : 


For  the  Kalman  filter  in  Section  2  given  that  inaccuracies 
exist  in  the  covariance  update  (i.e.,  P(-)  =  P(-)  +  AP(-)), 
the  number  of  bits  needed  in  the  mantissa  to  ensure  stability 
is  given  by: 


b  t  -  log. 


(I  -  KH)' 


where 


e  >  0  is  the  error  tolerance,  n  is  the  system  order  and 

\a\  i  f  2'”  • 

For  KH  -*■  0  (i.e.,  in  the  steady  state)  the  number  of  bits 

needed  in  the  mantissa  to  ensure  stability  can  be  estimated  as 
follows : 


2  -  log2  (¥) 


Suppose  P(-)  *  P(-)  +  AP (-)  . 

I,  t  t  , 

actual  error  in 

covariance  covariance 

exact 

covariance 

Then  P(+)  -  (I  -  KH)P(-)  +  (I  -  KH)  AP(-)  -  e 


-*■  (I  -  KH)  AP(-)  -  e 


(4.12) 


(4.13) 


Now  multiplying  both  sides  of  Eq.  (4.13)  by  (I  -  KH)  we  have: 


AP(-)  ±  e(l  -  KH)' 


(4.14) 


Taking  the  norm  of  both  sides  of  Eq.  (4.14)  and  using  the  fact 
that  |AP(-) I  1  j  2-b  results  in: 

|  2-b  i  e | (I  -  KH)_1 |  (4.15) 

Taking  the  log-  of  both  sides  of  Eq.  (4.15)  gives: 


-b  1  log. 


(I  -  KH)' 


(4.16) 


Multiply  both  sides  of  Eq.  (4.16)  by  -1  giving  the  desired 
result : 


I 


njiRjf  n*  jvji  ruwumji  tv*  n_*rL*nir 


b  -  -  i»s2(v  l(I  -  KH)‘1|) 


Note  that  if  KH  -»■  0  in  Eq.  (4,17)  we  have: 


1  *-*  (t) 


(4.17) 


(4.18) 


Note  that  the  results  of  Theorems  4.1  and  4.2  are  very  similar  and  provide 


the  same  results  under  the  conditions  that  H  =  I,  P(-)  =  I  and  KH  -»  0. 

4.3  MEMORY  CONSIDERATIONS 

The  panllel  Kalman  filter  algorithms  and  architectures  derived  in 
Sections  2  and  3  use  decoupling  to  permit  the  predictor  and  corrector  equations 
to  be  computed  on  separate  processors.  At  any  given  time  step  k,  the  state, 
covariance,  and  measurements  must  be  stored,  as  well  as  intermediate  values 
associated  with  the  linear  PKF's  matrix/vector  calculations.  For  a  typical 
nine  state  filter,  the  operation  count  given  in  Section  1  indicates  that  the 
number  of  operations  in  the  standard  Kalman  filter  is 


additions: 


n  x  n  +  2n  -  1 


multiplications:  2n  x  n  +  4n  +  1  =  199 
divisions:  1 

when  n  *  9.  The  data  storage  requirement  is  on  the  order  of  8  bytes  x 
(98  +  199  -  1)  *  2384  bytes  or  2K  bytes  for  64-bit  precision.  If  the  linear 
two  processor  PKF  developed  in  Section  2  is  used,  it  must  be  initialized  by 
running  the  standard  SKF  for  the  first  two  (2)  time  steps.  Then  the  dual  (2) 
processor  PKF  can  be  run  at  step  3  (i.e.,  at  k  *  3).  Hence,  the  following 
values  of  x,  $,  P,  z,  K,  H  and  R  must  be  stored  in  memory. 


State  Vector  Values: 

State  Transition  Matrix  Values: 
Covariance  Matrix  Values: 

Kalman  Gain  Values: 

Others : 


Xq  (+) ,  5^+),  XQ(-),  X  ^  (  — )  ,  x2(-) 

♦o»  V  *2 

PQ(+),  Pj(+)  P^-),  P2(-> 

Kl»  K2 

Hl»  H2*  Rl»  R2 


Once  the  linear  two  processor  PKF  is  initialized,  memory  is  needed  to  store 
the  updated  values  of  x,  P,  z,  H  and  R.  Hence,  in  general,  the  over¬ 

all  memory  requirement  is: 
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Memory  *  (np  +  1)  x  (storage  requirement  of  the  standard  Kalman  filter) 

where  np  *  the  number  of  parallel  processing  elements. 

Thus,  (np  +  1)  x  (2  Kbytes)  are  needed  to  store  the  data  in  the  parallel 
filter.  With  np  ■  2,  this  corresponds  to  6  Kbytes.  With  np  =  32,  this 
corresponds  to  approximately  64K  bytes  of  RAM. 

Note  that  the  above  memory  sizing  is  for  data  only.  The  PKF  program 
memory  has  not  been  sized.  Because  the  two  processor  PKF  algorithm  can  be 
coded  with  less  than  1000  lines  of  code  and  the  compiled  version  of  a  1000 
line  program  requires  about  128  Kbytes  of  RAM  to  store,  a  reasonable  estimate 
of  the  storage  requirement  for  the  linear  PKF  program  would  be: 

PKF  Program  memory  *  np  x  128  Kbytes 

where  np  *  the  number  of  parallel  processing  elements. 

With  32  processors,  the  program  memory  is,  therefore,  estimated  to  be 
32  x  128  Kbytes  *  4  Mbytes. 

Hence,  a  parallel  processor  with  4  Mbytes  of  bulk  memory  (i.e.,  relatively 
slow  DRAM)  and  64  Kbytes  of  fast  RAM  (i.e.,  cache  memory)  should  be  capable  of 
Implementing  a  32  processor  linear  PKF. 

Because  nonlinear  function  evaluation  generally  results  in  more  inter¬ 
mediate  values  than  linear  matrix/vector  operations,  the  amount  of  local  data 
storage  might  be  increased  by  a  factor  of  four.  Hence,  4  x  64K  ■  256K  of 
fast  RAM  is  recommended  for  nonlinear  extended  parallel  Kalman  filter  data 
storage.  Four  Mbytes  of  program  memory  should  be  sufficient,  however,  for  the 
nonlinear  PKF. 

4.4  PARALLEL  PROCESSOR  SELECTION 

One  method  of  estimating  the  computational  requirements  for  the  parallel 
Kalman  filter  is  to  total  the  number  of  additions,  multiplications  and  divi¬ 
sions  needed  to  complete  one  cycle  of  the  Kalman  filter  algorithm.  For  example, 
the  simple  Kalman  filter  algorithm  defined  in  Section  1  requires  only  98 
additions,  199  multiplications  and  1  division  per  cycle  for  a  nine-state  filter. 
Hence,  at  100  cycles  per  second  (i.e.,  100  Hz  sample  rate)  the  number  of 
arithmetic  operations  is  given  by  100x298  ■  29,800  operations  per  second. 
Ideally,  a  microprocessor  capable  of  33.1  u sec  per  operation  is  all  that  is 
needed  to  implement  a  nine-state  filter.  Hence,  a  single  Motorola  68020/68881 


pair  can  easily  handle  the  computational  requirements  of  the  Kalman  filter 
assuming  100%  efficiency.  Note  that  33.6  ysec  per  pass  through  the  filter 
corresponds  to  an  update  rate  of  2,800  samples  per  second. 

Typically,  however,  only  10  to  30%  of  peak  performance  is  achieved  in 
practice  due  to  data  bus  and  memory  access  time  restrictions.  Therefore,  one 
target  may  be  updated  at  a  4000  updates  per  second  rate.  100  targets  may  be 
updated  at  a  4  Hz  rate.  10,000  targets  at  a  0.4  Hz  rate  (every  2.5  seconds). 
For  nonlinear  filtering,  typical  of  SDI  target  tracking  problems,  64-bit 
precision  and  the  need  to  compute  trigonometric  functions  for  coordinate 
transformations  can  slow  computations  down  by  one  or  perhaps  two  orders  of 
magnitude  (lOx  to  lOOx) . 

Since  it  is  well  known  that  Kalman  filtering  must  be  performed  using 
floating-point  arithmetic  to  avoid  stability  problems,  the  only  viable  method 
to  gain  back  the  throughput  for  nonlinear  SDI  filtering  problems  using  an 
extended  Kalman  filter  is  with  parallel  processing.  Optical  processing  is 
fast  but  optical  fixed  point  can  cause  stability  problems  with  the  Kalman 
filter.  Therefore,  to  rapidly  implement  the  parallel  Kalman  filter  with 
32/64-bit  floating-point  precision,  an  aggregate  computation  rate  of  10,000  x 
29,800  *  298  million  operations  per  second  is  needed  to  track  10,000  targets 
simultaneously.  A  parallel  processing  system  of  16  x  16  =  256  processors 
needs  a  computation  rate  of  1.17  MFLOPs  per  processor  to  perform  the  necessary 
computations.  Using  four  (4)  25  MHz  Motorola  68881  math  coprocessors  per 
board,  1.26  MFLOP  performance  is  readily  achievable.  Hence,  with  256  boards 
it  it  feasible  to  track  10,000  targets  in  real  time. 

The  general-purpose  nature  of  the  Motorola  68020/68881  processors  is  well 
suited  for  nonlinear,  as  well  as  linear,  Kalman  filtering.  In  particular 
because  trigonometric  functions  (sin,  cos,  tag,  etc.)  and  square  roots 
commonly  occur  in  coordinate  transformations  associated  with  SDI  sensor 
processing,  high-speed  general-purpose  hardware  (such  as  the  Systolic-481 
parallel  numeric  processor)  is  needed  to  handle  the  throughput  requirements 
(see  Figure  4-2) . 

The  Systolic-481  is  a  Vme  bus  compatible  parallel  numeric  processor  board 
capable  of  full  IEEE-P754  standard  32-bit,  64-bit  and  80-bit  floating-point 
computations.  The  Systolic-481  contains  one  (1)  Motorola  68020.  four  (4) 


I 

I 


233.35  mm  x  160  nan  board.  Multiple  481  boards  can  be  installed  into  a  Vine  bus 
system  for  additional  performance  gains  approaching  a  CRAY  supercomputer.  The 
advantage  of  the  Systolic-481,  however,  is  that  it  is  very  compact.  A 
scientific  software  library  of  more  than  400  commonly-used  subroutines  (over 
200  separate  functions)  is  available  for  the  Systolic-481  simplifying  software 
development.  Because  the  Systolic-481  rapidly  performs  linear,  as  well  as 
nonlinear,  transcendental  and  trigonometric  functions,  it  is  well  matched  for 
SDI  battle  management  computations. 


FIGURE  4-2  Systolic-481  Vme  Bus  Architecture 


I 


Double  Precision  Floating-Point  Hardware; 

One  Motorola  68020  and  four  Motorola  68881  math  coprocessors  on  board 
Processor  Speeds:  16.67  MHz,  20.0  MHz  and  25.0  MHz 

Floating-Point  Data  Format; 

IEEE-P754  standard 

32-bit  single  precision,  8-bit  signed  exponent,  23-bit  mantissa 

64-bit  single  precision,  11-bit  signed  exponent,  52-bit  mantissa 

80-bit  single  precision,  15-bit  signed  exponent,  64-bit  mantissa 

Integer  Arithmetic  Formats; 

8-bit  byte  integer 
16-bit  word  integer 
32-bit  long  integer 

Data  Transfer; 

Data  transfers  conform  to  Vme  bus  protocol  and  performance.  External  DMA 
transfer  at  120  nsec  per  cycle  results  in  a  66.7  Mbyte  per  second  transfer 
rate. 

On  Board  Memory; 

256  Kbytes,  64K  x  32-bit,  70  ns  static  RAM 
Data  Registers; 

Four  stacks  of  8  80-bit  data  registers  (32  total) 

Stack  Pointers; 

Two  32-bit  pointers 

Constants: 

22  on-chip  ROM  constants 
Support  Functions: 

Full  set  of  trigonometric  and  transcendental  functions 
Bus  Interface: 

VME,  24-bit  address  (16  Mbytes),  16/32-bit  data  transfer 

High-Speed  Auxiliary  Port: 

32-bit  bus  on  Vmx  port 

Board  Size; 

233.35am  x  160.00mm  (optional  extension  bracket  to  233.35mm  to  220mm) 

Power  Requirements; 

♦5V,  5amps 


*  Multiple  481  boards  can  be  operated  in  parallel  if  additional  computational 
power  is  needed . 
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Table  4-2  :  Svstolic-481  Double  Precision  Benchmarks  (One  Board) 
64-bit  Floating  Point 


Addition . ...1.08  usee 

Subtraction . 1.08  usee 

Multiplication . 1.36  usee 

Division . 1.81  usee 

Square  Root . ....1.84  usee 

Sine . 6.62  usee 

Cosine . 6.62  usee 

Tangent . 6.93  usee 

Exponential . 7.26  usee 

Logorithm . 7.65  usee 


Note: 

1. )  Assumes  16.667  MHz  clock  and  4  Motorola  68881  math  coprocessors  are 
operating  in  parallel  at  peak  performance.  With  a  25  MHz  clock,  computation 
time  is  reduced  by  33%.  For  example,  the  64-bit  sine  time  would  be  only  4.4 
usee  running  at  25  MHz. 

2. )  The  proposed  testbed  includes  8  Systolic-481  boards  which  can  operate 
simultaneously  in  parallel.  With  8  boards  the  effective  computation  time  of 
the  system  is  1/8  of  the  above.  For  example,  with  a  25  MHz  clock,  a  64-bit 
sine  value  could  be  computed  in  4.4  usec/8  *  0.55  usee  or  550  nsec.  This 
assumes,  of  course,  that  at  least  4  sine  values  need  to  be  computed  as  is  the 
case  in  coordinate  transformations  during  SDI  measurement  processing.  Thus, 
the  32-processor  testbed  compares  favorably  with  a  CRAY  supercomputer. 


SECTION  5 


APPLICATION  OF  THE  PKF  TO  SDI  SENSOR  TRACK  PROCESSING 


The  SDI  sensor  processing  problem  can  be  partitioned  into  a  set  of  simpler 
tasks  which  when  "chained"  together  provide  information  to  the  battle  manager. 
Figure  5-1  illustrates  the  major  parts  of  the  sensor  data  processing  required 
by  the  SDI  program.  Although  each  major  block  in  Figure  5-1  can  benefit  from 
high-speed  computation,  this  report  is  concerned  with  scan-to-scan  correlation 
and  track  processing  since  Kalman  filtering  is  generally  required  for  precision 
tracking.  Thus,  the  remainder  of  this  section  formulates  the  SDI  track 
processing  problem  as  it  applies  to  the  parallel  Kalman  filtering  algorithms 
and  architectures  developed  under  our  Phase  I  SBIR  effort. 

5.1  BACKGROUND 

To  show  the  effectiveness/payoff  of  the  proposed  research  it  is  important 
to  consider  a  meaningful  SDI  problem.  The  SDI  problem  should  be  representative 
of  typical  ballistic  missile  applications  and  serve  as  a  baseline  to  measure 
the  benefits/accuracy  of  the  Phase  I  SBIR  parallel  Kalman  filter  technology. 

With  this  in  mind,  the  following  candidate  test  problem  is  recommended. 

Although  this  test  problem  is  relatively  simple,  it  illustrates  the  computations 
which  arise  in  SDI  sensor  track  processing. 

A  Simple  Test  Problem 

Consider  the  problem  of  estimating  the  position  of  an  object  (missile) 
from  angle-only  (or  range)  measurements.  The  geometry  of  this  two-dimensional 
tracking  problem  is  illustrated  in  Figure  5-2.  Typical  parameter  values  for 
this  exercise  are  given  in  Table  5-1. 

In  Figure  5.1,  Xt,  Yt  represents  the  target  position  in  X-Y  coordinates 
and  Yg  represents  the  sensor  position. 


The  target  range  is  given  by 
Zt  -  (X2  +  (Yt  -  Yg)2)1* 


(5.1) 
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FIGURE  5-1  Geometry  of  a  Two-Dimensional  SDI  Tracking  Problem 


TABLE  5-1  PARAMETER  VALUES  FOR  THE  ONE- DIMENSIONAL  TRACKING  PROBLEM 

p  -  3.4  X  l(f3lb  sec2/ ft4 
o 


g  -  32.2  ft/sec 


k  -  22000  ft 
P 


PU  -  500  ft 
o 


4  2  2 

2  X  10  ft  /sec 


B  -  N(2000  lb/ft2,  2.5  X  105lb2/ft4)  P33  -  2.5  X  105  lb2/ft4 

o 


0.05  lb/ft3  *<0>  -  i<0)  -  3  X  105  ft 

2  4 

x (0)  -  x (0)  *  2  X  10  ft/sec 

x3(0)  -  2  X  10-2  ft2/lb 
ft  (0)  -  6  X  10~4  ft2/lb 

With  angle-only  measurements  the  line-of-sight  angle,  0,  can  be  estimated  as 


follows : 


0  -  tan 


4(V) 


(5.2) 


The  target's  motion  is  modeled  as  a  falling  body  in  state  variable  form  as: 


where  3  is  the  so-called  ballistic  coefficient  of  the  missile  and  Y  is 
the  target's  height  above  the  earth. 

The  equations  of  motion  for  the  body  are: 


P  «  PQe 


"xl/kc 


(5.4) 


(5.5) 


where  d  is  drag  deceleration,  g  is  acceleration  of  gravity,  p  is  atmos¬ 
pheric  density  (with  pn  the  atmospheric  density  at  sea  level)  and  k  is  a 

u  p 

decay  constant.  The  differential  equation  for  velocity,  is  nonlinear 

through  the  dependence  of  drag  on  velocity,  air  density  a  ballistic  coefficient. 

Initial  values  of  the  state  variables  are  assumed  to  mean,  n,  and 
covariance  matrix  of  the  form 


(5.6) 


The  problem  of  estimating  all  the  state  variables  may  be  solved  using  an 
extended  Kalman  filter. 

This  SDI  test  problem  illustrates  the  class  of  computations  required  for 
SDI  target  tracking.  Squares,  divides,  square  roots  and  trigonometric  func¬ 
tions  (sine,  cosine,  tangent,  etc.)  are  needed.  In  addition,  the  extended 
Kalman  filter  requires  the  solution  of  nonlinear  ordinary  differential  equa¬ 
tions.  Thus,  high-speed  nonlinear  function  evaluation  is  important  to  SDI 
track  processing.  This  problem,  although  simple,  can  be  solved  relatively 
easily  to  provide  a  known  solution  to  verify  the  parallel  Kalman  filter 
algorithms  and  architectures  developed  under  this  Phase  I  SBIR  effort.  This 
test  problem  can  be  expanded  to  three  dimensions  and  angle-only  measurements 


of  target  and  sensor  position.  In  this  case,  the  SDI  track  processing 
problem  becomes  more  nonlinear  and  involves  additional  trig  functions  to  be 
computed  during  coordinate  transformations.  The  geometry  for  this  case  is 
discussed  in  the  next  section. 


5.2  SDI  PROBLEM  FORMULATION  AND  ASPHERICAL  EARTH  MATH  MODEL 

The  Kalman  filter  can  be  used  to  update  the  state  estimate  of  a  ballistic 
trajectory  with  angle  only  measurements.  The  nonlinear  relationship  between 
the  state  and  the  measurements,  and  the  nonlinear  dynamics  of  a  ballistic 

trajectory,  usually  require  the  state  estimate  to  be  improved  iteratively  with 

a  given  measurement  set.  The  problems  due  to  the  nonlinearities  become  more 
difficult  when  the  observer  is  free-falling  and  more  difficult  still  if  the 
observer  is  located  in  the  plane  of  the  observed  trajectory. 

The  first  step  in  the  Kalman  filter  equations  is  to  project  the  state 
estimate  to  the  time  of  the  current  measurement.  This  can  be  accomplished 

using  the  so-called  f  and  g  series,  which  can  be  found  in  Escobal  (1975). 

The  f  and  g  series  are  derived  by  Taylor  series  expansion  about  the 
current  target  position.  The  target  position  at  time  t  is  given  by 


X  =  f  X 


n-1 


a  x 

B  n- 


1 


(5.7) 


where 


n-1 


n-1 


is  the  current  ECI  target  position  (x,y,z)  at  the  time  of  the 

last  measurement,  t  , 

*  n-1 

•  •  • 

is  the  current  target  velocity  (x,y,z) 


f 

g 


t  “  t  .  ) 
n  n-1 


8  (Xn-l*  Xn-1»  t"  "  t— ^ 


n-1' 


The  target  velocity  at  time  tfl  is  given  by 


X  -  f  X 


n-1 


+  g  X, 


'n-1 


(5.8) 


The  f  and  g  series  given  in  Escobal  is  based  on  a  spherical  earth. 

Over  periods  of  time  of  several  hundred  seconds  this  may  introduce  significant 
bias  errors.  Series  approximation  of  the  equations  of  motion  over  an  aspherical 
earth  have  been  derived  but  not  implemented. 


An  alternative  to  f  and  g  series  projection  of  the  state  is 
numerical  integration  of  the  equations  of  motion.  A  mathematical  model, 
given  in  Escobal  (1975),  which  accounts  for  an  aspherical  earth  is  given  in 
the  next  section. 

5.2.1  SD1  Target  Model 


••-sHM'-’ffij) 

--isHM'-’W')) 


where 


u  is  the  gravitational  constant  x  mass  of  the  earth  =  3.988  x  10J 
R  =  (X2  +  Y2  +  Z 2Y*  in  meters 

J  is  the  first  harmonic  of  the  earth’s  gravitational  potential  * 
4.4028  x  1010  . 


5.2.2  SDI  Sensor  Measurement  Model 


((XT  -  Xg)  +  (Yt  -  Yx)“) “  j 


where 


X^,,  Yt,  is  the  ECI  target  position 
Xg,  Yg,  Zg  is  the  ECI  sensor  position  . 

In  view  of  the  problem  formulation  presented  in  this  section,  it  is  clear 
that  SDI  sensor  track  processing  is  inherently  nonlinear.  Squares,  square 
roots,  sines,  cosines,  inverse  tangents  and  other  nonlinear  scalar  computations 
characterize  the  target  state  and  measurement  models.  These  nonlinear  models 
can  be  linearized  about  a  "nominal"  trajectory  and  a  linear  parallel  Kalman 
filter  used  to  approximate  the  solution. 
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TRANSFORMATION  OF  MEASUREMENTS  TO  ECI  (Wilcox,  1986) 
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SECTION  6 


CONCLUSIONS  AND  RECOMMENDATIONS 


6.1  CONCLUSIONS 

Based  on  the  results  of  our  Phase  I  study,  the  following  conclusions  can 
be  drawn: 

a.  It  is  technically  feasible  to  decouple  the  predictor  and 
corrector  equations  in  a  standard  Kalman  filter  for  parallel 
processing  on  multiple  processors. 

b.  The  decoupling  principle  allows  the  parallel  Kalman  filter's 
predictor  and  corrector  equations  to  be  computed  on  separate 
processors  improving  computational  speed  directly  propor¬ 
tional  to  the  number  of  available  processing  elements. 

c.  The  parallel  Kalman  filter  (PKF)  is  "optimal"  in  the  same  sense 
as  the  standard  Kalman  filter  (SKF)  since  it  was  shown  that  the 
PKF  is  "mathematically  equivalent"  to  the  SKF  (i.e.,  the  recur¬ 
sive  updates  generated  by  the  PKF  are  identical  to  the  recur¬ 
sive  updates  of  the  SKF  when  combined  in  the  proper  fashion) . 

d.  Parallel  architectures,  based  on  systolic  array  principles, 
have  been  developed  that  are  100%  efficient  when  coefficient 
reordering  is  employed. 

e.  It  is  feasible  to  extend  the  linear  PKF  theory  to  nonlinear 
target  tracking  and  estimation  problems  allowing  an  extended 
Kalman  filter  to  run  on  multiple  parallel  processors. 


In  summary,  it  can  be  concluded  that  parallel  Kalman  filtering  based  on 
decoupling  the  filter's  predictor  and  corrector  equations  is  feasible.  Both 
linear  and  nonlinear  filtering  can  benefit  from  this  unique  approach.  Hence, 
this  research  activity  appears  well  suited  for  transition  to  the  Phase  II 
stage  of  the  SBIR  program. 

6.2  RECOMMENDATIONS 

Based  on  the  conclusions  derived  from  our  Phase  I  results,  the  following 
recommendations  are  presented: 


Further  expand  the  PKF  theory  for  nonlinear  filtering  and 
estimation.  Because  the  SDI  target  tracking  problem  tends  to 
be  nonlinear,  it  is  anticipated  that  target  trajectory  estima¬ 
tion  accuracy  can  be  substantially  improved  using  the  nonlinear 
equations  directly. 

Code,  simulate  and  evaluate  the  PKF  algorithms  on  a  parallel 
computer  whose  architecture  can  be  reconfigured  to  validate 
newly  developed  SDI  algorithms  and  architectures.  Although  the 
PKF  algorithms  have  been  analytically  shown  to  be  optimal  and 
stable,  many  issues  regarding  the  implementation  of  the  parallel 
filter  can  be  learned  by  coding  and  simulating  the  PKF  algorithms 
and  architectures.  For  example,  timing,  synchronization,  drift, 
potential  divergence  of  the  error  covariance  update,  model  sensi¬ 
tivities  could  have  a  major  impact  on  the  ultimate  application  of 
the  PKF.  Hence,  it  is  recommended  that  an  expert  system  be 
developed  to  manage  PKF  computations.  The  knowledge  base  of  the 
expert  system  could  be  based  on  mathematically  sound  "rules"  such 
as  monitoring  the  positive  def initieness  of  the  error  covariance 
matrix. 

Create  a  Battle  Management  Testbed  Facility  (based  on  industry- 
standard  hardware  and  software).  A  f lexible/reconf igurable 
parallel  processing  testbed  is  recommended  to  rapidly  test  and 
evaluate  the  performance  of  newly  developed  SDI  parallel  pro¬ 
cessing  algorithms  and  architectures.  Because  SDI  track  proces¬ 
sing  tends  to  be  a  very  large  nonlinear  filtering  problem,  a 
scalable  architecture  (i.e.,  expandable  based  on  problem  size) 
for  nonlinear  function  evaluation  is  recommended.  General- 
purpose  microprocessor /coprocessor  technology  augmented  by  a 
programmable  finite  state  machine  is  recommended  to  accommodate 
a  wide  class  of  parallel  algorithms.  Four  processors  per  card 
are  recommended  to  simultaneously  compute  the  equations  in  the 
decoupled  PKF  (i.e.,  two  processors  for  the  predictor  and  two 
processors  for  the  corrector  per  card) .  Multiple  cards  (say 
eight  (8))  can  be  installed  in  the  testbed  to  validate  essential¬ 
ly  any  parallel  algorithm  and  architecture.  An  industry-standard 
Vme  bus  is  also  recommended  for  several  reasons:  1)  Vme  is  a 
very  high-performance  bus,  2)  Vme  is  supported  by  several  major 
companies  allowing  the  government  to  add  "special  function"  cards 
to  the  system,  and  3)  Vme  is  also  standard  in  high-431,  milspec 
and  ruggedized  systems  for  actual  field  test  of  our  PKF  technology. 

Select  a  realistic  SDI  problem  to  show  the  benefits  of  the  PKF 
technology.  Because  of  the  size  and  complexity  of  realistic  SDI 
target  tracking  applications,  it  is  anticipated  that  even  today’s 
supercomputer  architectures  will  not  be  capable  of  solving  these 
problems  in  near  real  time.  Due  to  the  unique  matching  of  the 
PKF  algorithms  and  architectures,  it  is  anticipated  that  problems 
that  could  not  be  solved  otherwise  in  a  reasonable  time  (at  a 
reasonable  cost)  can  be  solved  on  the  proposed  testbed.  Thus,  it 
is  recommended  that  a  target  tracking  problem  of  major  signifi- 
ance  to  the  SDI  program  be  solved  and  benchmark  performance  docu¬ 
mented  so  that  future  designs  can  be  compared.  Due  to  the 


1  * 


•  »*  «'»  I 


a||  ■  af.»  #.*«»»  f.o  ■ 


"special"  architecture  of  the  Battle  Management  testbed  it  is 
anticipated  that  it  can  be  the  standard  to  improve  upon  for  the 
next  five  (5)  years. 


6.3  SUMMARY 

-  ^ 

Based  on  the  results  in  this  report  it  is  clear  that  the  PKF  theory  is 

well  developed,  mature  and  ready  to  proceed  to  full-scale  validation  on  a 
parallel  processing  testbed.  Because  the  PKF  technology  has  been  needed  to 
solve  several  applications  in  the  DoD  for  more  than  a  decade,  it  is  antici¬ 
pated  that  once  fully  developed  this  technology  can  benefit  several  sectors 
of  the  DoD.  This  is  possible  because  the  necessary  technology  has  only 
recently  been  available  to  transition  the  PKF  theory  into  practice.  Hence, 
Systolic  Systems  would  be  pleased  to  continue  this  program  under  Phase  II 
of  the  SBIR  program. 
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